1 Introduction

We start with definitions given by Plotkin, Shmoys, and Tardos [16]. Given
$A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$ and a polytope $P \subseteq \mathbb{R}^n$, the fractional packing problem is
to find an $x \in P$ such that $Ax \le b$ if such an $x$ exists. An $\epsilon$-approximate solution
to this problem is an $x \in P$ such that $Ax \le (1 + \epsilon)b$. An $\epsilon$-relaxed decision
procedure always finds an $\epsilon$-approximate solution if an exact solution exists.

A Dantzig-Wolfe-type algorithm for a fractional packing problem $x \in P$, $Ax \le b$
is an algorithm that accesses $P$ only by queries to $P$ of the following
form: "given a vector $c$, what is an $x \in P$ minimizing $c \cdot x$?"
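To make the query model concrete, here is a minimal sketch of an algorithm that touches $P$ only through such queries. It assumes $P$ is the simplex (as in the instances studied later) and uses a Hedge-style exponential weighting of the rows with an assumed step size; the names `simplex_oracle`, `oracle_only_packing` and `max_row_load` are hypothetical, and this is not the algorithm of any of the cited papers.

```python
import math

def simplex_oracle(c):
    """Answer the query "which x in P minimizes c . x?" when P is the simplex.

    A linear function over the simplex is minimized at a vertex, so it
    suffices to return the index of a smallest entry of c.
    """
    return min(range(len(c)), key=lambda j: c[j])

def oracle_only_packing(A, num_queries):
    """Hedge-style sketch that accesses P only through simplex_oracle.

    Returns the average of the vertices returned by the oracle.
    """
    m, n = len(A), len(A[0])
    eta = math.sqrt(8.0 * math.log(m) / num_queries)  # assumed step size
    load = [0.0] * m   # load[i] = sum over queries of A[i][j_t]
    counts = [0] * n   # counts[j] = number of times vertex j was returned
    for _ in range(num_queries):
        top = max(load)
        # Row weights grow exponentially with the load of the row.
        y = [math.exp(eta * (v - top)) for v in load]
        # Query vector: weighted combination of the rows of A.
        c = [sum(y[i] * A[i][j] for i in range(m)) for j in range(n)]
        j = simplex_oracle(c)
        counts[j] += 1
        for i in range(m):
            load[i] += A[i][j]
    return [k / num_queries for k in counts]

def max_row_load(A, x):
    """Largest entry of the vector A x."""
    return max(sum(a * xj for a, xj in zip(row, x)) for row in A)
```

The returned point is a convex combination of at most `num_queries` vertices, which is exactly the structure the lower-bound argument exploits.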

There are Dantzig-Wolfe-type $\epsilon$-relaxed decision procedures (e.g. [16]) that
require $O(\rho \epsilon^{-2} \log m)$ queries to $P$, where $\rho$ is the width of the problem instance,
defined as follows:

$$\rho(A, P) = \max_{x \in P} \max_{i} \frac{A_i \cdot x}{b_i}$$

where $A_i$ denotes the $i$th row of $A$.
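As a small illustration of the width, the sketch below evaluates it for the special case where $P$ is the simplex and every $b_i$ is positive (both are assumptions of this example; the function name is hypothetical). For each row the ratio is linear in $x$, so the inner maximum is attained at a vertex of the simplex, that is, at a single column.

```python
def width_over_simplex(A, b):
    """Width max_{x in P} max_i A_i . x / b_i when P is the simplex.

    Assumes every b_i > 0. The maximum over x is attained at a vertex,
    so it is enough to look at the largest entry of each row.
    """
    return max(max(row) / b_i for row, b_i in zip(A, b))
```

For example, with rows `[1, 0, 1]` and `[0, 1, 0]` and right-hand sides `0.5` and `0.25`, the second row determines the width.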

In this paper we give a natural probability distribution of fractional packing
instances such that, for an instance chosen at random, with probability $1 - o(1)$
any Dantzig-Wolfe-type $\epsilon$-relaxed procedure must make at least $\Omega(\rho \epsilon^{-2} \log m)$
queries to $P$. This lower bound matches the aforementioned upper bound,
providing evidence that the unfortunate linear dependence of the running times of
these algorithms on the width and on $\epsilon^{-2}$ is an inherent aspect of the
Dantzig-Wolfe approach.

The specific probability distribution we study here is as follows. Given $m$ and
$\rho$, let $A$ be a random $\{0, 1\}$-matrix with $m$ rows and $n = \sqrt{m}$ columns, where
each entry of $A$ has probability $1/\rho$ of being 1. Let $P$ be the $n$-simplex, and let
$b$ be the $m$-vector whose every entry is some $v$, where $v$ is as small as possible
so that $Ax \le b$ for some $x \in P$.
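A minimal sketch of sampling the matrix $A$ from this distribution is given below. The function name and the `seed` parameter are hypothetical, and $n$ is taken to be the integer part of $\sqrt{m}$ as an assumption. The sketch does not compute $v$, which is the value of the matrix game defined in the next section.

```python
import math
import random

def random_packing_matrix(m, rho, seed=0):
    """Sample an m x n {0,1}-matrix, n = floor(sqrt(m)), entries 1 w.p. 1/rho."""
    rng = random.Random(seed)   # seeded only to make the example repeatable
    n = math.isqrt(m)
    return [[1 if rng.random() < 1.0 / rho else 0 for _ in range(n)]
            for _ in range(m)]
```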

The class of Dantzig-Wolfe-type algorithms encompasses algorithms and
algorithmic methods that have been actively studied since the 1950s through the
current time, including:

- an algorithm by Ford and Fulkerson for multicommodity flow,

- Dantzig-Wolfe decomposition (generalized linear programming),

- Benders' decomposition [?],

- the Lagrangean relaxation method developed by Held and Karp and applied
to obtaining lower bounds for the traveling salesman problem [?],

- the multicommodity flow approximation algorithms of Shahrokhi and
Matula [18], of Klein et al., and of Leighton et al. [15],

- the covering and packing approximation algorithms of Plotkin, Shmoys, and
Tardos [16] and the approximation algorithms of Grigoriadis and Khachiyan [?]
for block-angular convex programs, and many subsequent works (e.g. [20], [21]).

In a later section we discuss some of the history of the above algorithms and 
methods and how they relate to the fractional packing problem studied here. 

To prove the lower bound we use a probabilistic, discrepancy-theory
argument to characterize the values of random $m \times s$ zero-sum games when $s$ is much
smaller than $m$. From the point of view proposed in [20], where fractional
packing algorithms are derived using randomized rounding (and in particular the
Chernoff bound), the intuition for the lower bound here is that it comes from
the fact that the Chernoff bound is essentially tight.

Some of the multicommodity flow algorithms, and subsequently the
algorithms of Plotkin, Shmoys, and Tardos and of Grigoriadis and Khachiyan, use a more
general model than the one described above. This model assumes the polytope
$P$ is the cross-product $P = P^1 \times \cdots \times P^k$ of $k$ polytopes. In this model, each
iteration involves optimizing a linear function over one of the polytopes $P^i$.
It is straightforward to extend our lower bound to this model by making
$A$ block-diagonal, thus forcing each subproblem to be independently solved. In
this general case, the lower bound shows that the number of iterations must be
$\Omega(\epsilon^{-2} (\sum_i \rho_i) \log m)$, where $\rho_i$ is the width of $P^i$. This lower bound is also tight
within a constant factor, as it matches the upper bounds of Plotkin, Shmoys,
and Tardos.
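One way to picture the block-diagonal construction is sketched below: each block acts only on its own group of columns, so the constraints of one subproblem never involve the variables of another. The helper name `block_diagonal` is hypothetical, and the blocks are assumed to be non-empty lists of equal-length rows.

```python
def block_diagonal(blocks):
    """Stack matrices along the diagonal, padding with zeros elsewhere."""
    total_cols = sum(len(B[0]) for B in blocks)
    rows, offset = [], 0
    for B in blocks:
        width = len(B[0])
        for row in B:
            # Zeros to the left and right of the block's own columns.
            rows.append([0] * offset + list(row)
                        + [0] * (total_cols - offset - width))
        offset += width
    return rows
```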

Previous Lower Bounds. In 1977, Khachiyan proved an $\Omega(\epsilon^{-1})$ lower
bound on the number of iterations to achieve an error of $\epsilon$.

In 1994, Grigoriadis and Khachiyan proved an $\Omega(m)$ lower bound on the
number of iterations to achieve a relative error of $\epsilon = 1$. They did not consider
the dependence of the number of iterations on $\epsilon$ for smaller values of $\epsilon$.

Freund and Schapire [?], in an independent work in the context of learning
theory, prove a lower bound on the net "regret" of any adaptive strategy for 
playing repeated zero-sum games against an adversary. This result is related to, 
but different from, the result proved here. 

2 Proof of Main Result 

For any $m$-row $n$-column matrix $A$, define the value of $A$ (considered as a
two-player zero-sum matrix game) to be

$$V(A) = \min_{x} \max_{1 \le i \le m} A_i \cdot x$$

where $A_i$ denotes the $i$th row of $A$ and $x$ ranges over the $n$-vectors with
non-negative entries summing to 1.

Theorem 1. For $m \in \mathbb{N}$, $n = \Theta(m^{0.5})$, and $p \in (0, 1/2)$, let $A$ be a random
$\{0, 1\}$ $m \times n$ matrix with i.i.d. entries, each being 1 with probability $p$. Let $\epsilon > 0$.
With probability $1 - o(1)$,

- $V(A) = \Omega(p)$, and

- for $s \le \min\{(\ln m)/(p\epsilon^2),\ m^{0.5-\delta}\}$ (where $\delta > 0$ is fixed), every $m \times s$
submatrix $B$ of $A$ satisfies

$$V(B) > (1 + c\epsilon)V(A)$$

where $c$ is a constant depending on $\delta$.

Our main result follows as a corollary. 

Corollary 1. Let $m \in \mathbb{N}$, $\rho > 2$, and $\epsilon > 0$ be given such that $\rho \epsilon^{-2} = O(m^{0.5-\delta})$
for some constant $\delta > 0$.

For $p = 1/\rho$ and $n = m^{0.5}$, let $A$ be a random $\{0, 1\}$ $m \times n$ matrix as in the
theorem. Let $b$ denote the $m$-element vector whose every element is $V(A)$. Let
$P = \{x \in \mathbb{R}^n : x \ge 0,\ \sum_i x_i = 1\}$ be the $n$-simplex.

Then with probability $1 - o(1)$, the fractional packing problem instance $x \in P$,
$Ax \le b$ has width $O(\rho)$, and any Dantzig-Wolfe-type $\epsilon$-relaxed decision
procedure requires at least $\Omega(\rho \epsilon^{-2} \log m)$ queries to $P$ when given the instance as
input.

Assuming the theorem for a moment, we prove the corollary. Suppose that
the matrix $A$ indeed has the two properties that hold with probability $1 - o(1)$
according to the theorem. It follows from the definition of $V(A)$ that there exists
$x^* \in P$ such that $Ax^* \le b$. That is, there exists a (non-approximate) solution
to the fractional packing problem.

To bound the width, let $x$ be any vector in $P$. By definition of $P$ and $A$, for
any row $A_j$ of $A$ we have $A_j \cdot x \le 1$. On the other hand, from the theorem we
know that $V(A) = \Omega(p) = \Omega(1/\rho)$. Since $b_j = V(A)$, it follows that $A_j \cdot x / b_j$ is
$O(\rho)$. Since this is true for every $j$ and $x \in P$, this bounds the width.

Now consider any Dantzig-Wolfe-type $\epsilon$-relaxed decision procedure. Suppose
for a contradiction that it makes no more than $s \le \rho (\epsilon/c)^{-2} \ln m$ calls to the
oracle that optimizes over $P$. In each of these calls, the oracle returns a vertex
of $P$, i.e. a vector of the form

$$(0, 0, \ldots, 0, 1, 0, \ldots, 0, 0).$$

Let $S$ be the set of vertices returned, and let $P(S)$ be the convex hull of these
vertices. Every vector in $P(S)$ has at most $s$ non-zero entries, for its only
non-zero entries can occur in positions for which there is a vector in $S$ having a 1 in
that position. Hence, by the theorem applied with $\epsilon/c$ in place of $\epsilon$, there is no
vector $x \in P(S)$ that satisfies $Ax \le (1 + \epsilon)b$.
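The last step can be spelled out as follows. This is a sketch in which $B$ is a name introduced here for the submatrix of $A$ formed by the (at most $s$) columns in which some vertex of $S$ has a 1, and the theorem is applied with $\epsilon/c$ in place of $\epsilon$. For every $x \in P(S)$:

```latex
\max_{1 \le i \le m} A_i \cdot x
  \;\ge\; V(B)
  \;>\; \Bigl(1 + c \cdot \tfrac{\epsilon}{c}\Bigr) V(A)
  \;=\; (1 + \epsilon)\, V(A)
  \;=\; (1 + \epsilon)\, b_i \quad \text{for every } i .
```

So some row of $Ax$ exceeds $(1 + \epsilon)$ times the corresponding entry of $b$, which is what rules out an $\epsilon$-approximate solution inside $P(S)$.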
Consider running the same algorithm on the fractional packing problem $Ax \le b$,
$x \in P(S)$, i.e. with $P(S)$ replacing $P$. The procedure makes all the same queries
to $P$ as before, and receives all the same answers, and hence must give the
same output, namely that an $\epsilon$-approximate solution exists. This is an incorrect
output, which contradicts the definition of a relaxed decision procedure.



3 Proof of Theorem 1

For any $m$-row $n$-column matrix $A$, define the value of $A$ (considered as a
two-player zero-sum matrix game) to be

$$V(A) = \min_{x} \max_{1 \le i \le m} A_i \cdot x$$

where $A_i$ denotes the $i$th row of $A$ and $x$ ranges over the $n$-vectors with
non-negative entries summing to 1.

Before we give the proof of Theorem 1, we introduce some simple tools for 
reasoning about V(X) for a random {0, 1} matrix X. 

By the definition of $V$, $V(X)$ is at most the maximum, over all rows, of the
average of the row's entries. Suppose each entry in $X$ is 1 with probability $q$, and
within any row of $X$ the entries are independent. Then for any $\delta$ with $0 < \delta < 1$, a
standard Chernoff bound implies that the probability that a given row's average
exceeds $(1 + \delta)q$ is $\exp(-\Theta(\delta^2 q n_X))$, where $n_X$ is the number of columns of
$X$. Thus, by a naive union bound, $\Pr[V(X) \ge (1 + \delta)q] \le m_X \exp(-\Theta(\delta^2 q n_X))$,
where $m_X$ is the number of rows of $X$. For convenience we rewrite this bound
as follows. For any $q \in [0, 1]$ and $\beta \in (0, 1]$, assuming $m_X/\beta \to \infty$,

$$\Pr[V(X) \ge (1 + \delta)q] = o(\beta) \quad \text{for some } \delta = O\left(\sqrt{\frac{\ln(m_X/\beta)}{q\, n_X}}\right). \qquad (1)$$
We use an analogous lower bound on $V(X)$. By von Neumann's Min-Max Theorem

$$V(X) = \max_{y} \min_{i} X'_i \cdot y$$

(where $X'$ denotes the transpose of $X$). Thus, reasoning similarly, if within any
column of $X$ (instead of any row) the entries are independent,

$$\Pr[V(X) \le (1 - \delta)q] = o(\beta) \quad \text{for some } \delta = O\left(\sqrt{\frac{\ln(n_X/\beta)}{q\, m_X}}\right), \qquad (2)$$

assuming $n_X/\beta \to \infty$. We will refer to (1) and (2) as the naive upper and lower
bounds on $V(X)$, respectively.
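The two deterministic facts behind the naive bounds can be sketched in a few lines: playing the uniform distribution over columns shows that $V(X)$ is at most the largest row average, and playing the uniform distribution over rows shows that $V(X)$ is at least the smallest column average. The function name below is hypothetical; the probabilistic part of the bounds is not modeled.

```python
def naive_value_bounds(X):
    """Return (lower, upper) with lower <= V(X) <= upper.

    upper: largest row average (uniform strategy over the columns).
    lower: smallest column average (uniform strategy over the rows).
    """
    m, n = len(X), len(X[0])
    upper = max(sum(row) / n for row in X)
    lower = min(sum(X[i][j] for i in range(m)) / m for j in range(n))
    return lower, upper
```

For the 2 x 2 identity matrix both bounds equal 0.5, which is the value of that game.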
Proof of Theorem 1.

The naive lower bound on $V(A)$ shows that

$$\Pr[V(A) \le p(1 - \delta_0)] = o(1) \quad \text{for some } \delta_0 = O\left(\sqrt{\frac{\ln n}{p\, m}}\right) = o(1). \qquad (3)$$

Thus, $V(A) \ge \Omega(p)$ with probability $1 - o(1)$.

Let $s = \min\{(\ln m)/(p\epsilon^2),\ m^{0.5-\delta}\}$. Assume without loss of generality that $s =
(\ln m)/(p\epsilon^2)$ (by increasing $\epsilon$ if necessary).

We will show that with probability $1 - o(1)$ any $m \times s$ submatrix $B$ of $A$ has
value

$$V(B) > (1 + c\epsilon)V(A). \qquad (4)$$

The definition of value implies that $V(B') \ge V(B)$ for any $m \times s'$ submatrix $B'$
of $B$ (where $s' \le s$). Thus we obtain (4) for such submatrices $B'$ as well.

For any of the $m$ rows of $B$, the expected value of the average of the $s$ entries
is $p$. We will show that at least $r = s^2 \ln n$ of the rows have a higher than average
number of ones, and by focusing on these rows we will show that $V(B)$ is likely
to be significantly higher than $V(A)$.

For appropriately chosen $\delta_1$, the probability that a given row of $B$ has at least
$\ell = (1 + \delta_1)ps$ ones is at least $\binom{s}{\ell} p^{\ell} (1 - p)^{s-\ell} = \exp(-O(\delta_1^2 ps))$. (That is, the
Chernoff bound is essentially tight here up to constant factors in the exponent.)
Call any such row good and let $G$ denote the number of good rows. In particular,
choosing some

$$\delta_1 = \Omega\left(\sqrt{\frac{\ln(m/r)}{ps}}\right),$$



the probability that any given row is good is at least $2r/m$ and the expectation
of $G$ is at least $2r$. Since $G$ is a sum of independent $\{0, 1\}$ random
variables, $\Pr[G < r] < \exp(-r/8)$.
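The claim that the Chernoff bound is nearly tight can be checked numerically for small parameters. The sketch below (hypothetical function names, standard library only) compares the exact binomial tail with the single term $\binom{s}{\ell} p^{\ell} (1-p)^{s-\ell}$ used above as a lower bound and with the textbook Chernoff upper bound $\exp(-\delta^2 ps/3)$, which is valid for $0 < \delta \le 1$.

```python
import math

def single_term(s, p, ell):
    """Probability that a row of s independent entries has exactly ell ones."""
    return math.comb(s, ell) * p**ell * (1 - p)**(s - ell)

def prob_at_least(s, p, ell):
    """Exact probability that the row has at least ell ones."""
    return sum(single_term(s, p, k) for k in range(ell, s + 1))

def chernoff_upper(s, p, delta):
    """Textbook Chernoff upper bound on Pr[#ones >= (1 + delta) p s]."""
    return math.exp(-delta * delta * p * s / 3.0)
```

With `s = 100`, `p = 0.1` and `delta = 0.5` (so `ell = 15`), the single term is below the exact tail, which in turn is below the Chernoff upper bound.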

By the choice of $r$, this is $o(1/n^s)$, so with probability $1 - o(1/n^s)$, $B$ has at
least $r$ good rows.

Suppose this is indeed the case and select any $r$ good rows. Let $C$ be the $r \times s$
submatrix of $B$ formed by the chosen rows. In any column of $C$, the entries are
independent and by symmetry each has probability at least $p(1 + \delta_1)$ of being 1.
Applying the naive lower bound (2) to $V(C)$, we find

$$\Pr[V(C) \le p(1 + \delta_1)(1 - \delta_2)] = o(1/n^s) \quad \text{for some } \delta_2 = O\left(\sqrt{\frac{s \ln n}{p\, r}}\right). \qquad (5)$$

By the choice of $r$, $\delta_2 = o(\delta_1)$. Thus $(1 + \delta_1)(1 - \delta_2) = 1 + \Omega(\delta_1)$. Since $V(B) \ge
V(C)$, we find that, for any $m \times s$ submatrix $B$, $V(B) \ge p(1 + \Omega(\delta_1))$ with
probability $1 - o(1/n^s)$.

Since there are at most $\binom{n}{s} \le n^s$ distinct $m \times s$ submatrices $B$ of $A$, the
probability that all of them have value at least $p(1 + \Omega(\delta_1))$ is $1 - o(1)$. Finally, applying
the naive upper bound to $V(A)$ shows that

$$\Pr[V(A) \ge p(1 + \delta_3)] = o(1) \quad \text{for some } \delta_3 = O\left(\sqrt{\frac{\ln m}{p\, n}}\right). \qquad (6)$$

Since $\delta_3 = o(\delta_1)$, the result follows. □
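The closing step can be unpacked as follows. This is a sketch under two assumptions: that $\delta_1$ is of order $\sqrt{\ln(m/r)/(ps)}$, and that $\ln(m/r) = \Theta(\ln m)$, which holds for $r = s^2 \ln n$ and $s \le m^{0.5-\delta}$. The constant $c$ in (4) is whatever the $\Omega(\cdot)$ terms hide.

```latex
\begin{align*}
\frac{V(B)}{V(A)}
  &\ge \frac{p\,(1 + \Omega(\delta_1))}{p\,(1 + \delta_3)}
   \ge 1 + \Omega(\delta_1)
   && \text{since } \delta_3 = o(\delta_1), \\
\delta_1
  &= \Omega\!\left(\sqrt{\frac{\ln(m/r)}{ps}}\right)
   = \Omega\!\left(\sqrt{\frac{\ln m}{ps}}\right)
   = \Omega(\epsilon)
   && \text{since } s = \frac{\ln m}{p\,\epsilon^{2}} .
\end{align*}
```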
4 Historical Discussion

Historically, there are three lines of research within what we might call the
Dantzig-Wolfe model. One line of work began with a method proposed by Ford
and Fulkerson for computing multicommodity flow. Dantzig and Wolfe noticed
that this method was not specific to multicommodity flow; they suggested
decomposing an arbitrary linear program into two sets of constraints, writing it
as

$$\min\{cx : x \ge 0,\ Ax \ge b,\ x \in P\} \qquad (7)$$

and solving the linear program by an iterative procedure: each iteration
involves optimizing over the polytope $P$. This approach, now called Dantzig-Wolfe
decomposition, is especially useful when $P$ can be written as a cross-product
$P_1 \times \cdots \times P_k$, for in this case minimization over $P$ can be accomplished by
minimizing separately over each $P_i$. Often, for example, distinct $P_i$'s constrain
disjoint subsets of variables. In practice, this method tends to require many
iterations to obtain a solution with value optimum or nearly optimum, often too
many to be useful.

Lagrangean Relaxation

A second line of research is represented by the work of Held and Karp [?], [?]. In
1970 they proposed a method for estimating the minimum cost of a
traveling-salesman tour. Their method was based on the concept of a 1-tree, which is a
slight variant of a spanning tree. They proposed two ways to calculate this
estimate; one involved formulating the estimate as the solution to the mathematical
program

$$\max_{u} \left( ub + \min_{x \in P} (c - uA)x \right) \qquad (8)$$

where $P$ is the polytope whose vertices are the 1-trees. They suggested an
iterative method to find an optimal or near-optimal solution: given
some initial assignment to $u$, find a minimum-cost 1-tree with respect to the
edge-costs $c - uA$. Next, update the node-prices $u$ based on the degrees of the
nodes in the 1-tree found. Find a min-cost 1-tree with respect to the modified
costs, update the node-prices accordingly, and so on.

Like Dantzig and Wolfe's method, this method's only dependence on the
polytope $P$ is via repeatedly optimizing over it. In the case of Held and Karp's
estimate, optimizing over $P$ amounts to finding a minimum-cost spanning tree.
Their method of obtaining an estimate for the solution to a discrete-optimization
problem came to be known as Lagrangean relaxation, and has been applied to
a variety of other problems.

Held and Karp's method for finding the optimal or near-optimal solution
to (8) turns out to be the subgradient method, which dates back to the early
sixties. Under certain conditions this method can be shown to converge in the
limit, but, like Dantzig and Wolfe's method, it can be rather slow. (One author
refers to "the correct combination of artistic expertise and luck" [11] needed
to make progress in subgradient optimization.)
Fractional Packing and Covering

The third line of research, unlike the first two, provided guaranteed convergence
rates. Shahrokhi and Matula [18] gave an approximation algorithm for a special
case of multicommodity flow. Their algorithm was improved and generalized by
Klein, Plotkin, Stein, and Tardos [13], Leighton et al. [15], and others. Plotkin,
Shmoys, and Tardos [16] noticed that the technique could be generalized to apply
to the problem of finding an element of the set

$$\{x : Ax \le b,\ x \in P\} \qquad (9)$$

where $P$ is a convex set and $A$ is a matrix such that $Ax \ge 0$ for every $x \in P$. In
particular, as discussed in the introduction, they gave an $\epsilon$-relaxed decision
procedure that required $O(\rho \epsilon^{-2} \log m)$ queries to $P$, where $\rho$ is the width of
the problem instance.

A similar result was obtained independently by Grigoriadis and Khachiyan [?].
Many subsequent algorithms (e.g. [20], [21]) built on these results. Furthermore,
many applications for these results have been proposed.

This method of Plotkin, Shmoys, and Tardos and of Grigoriadis and Khachiyan
improves on Dantzig-Wolfe decomposition and subgradient optimization in that it
does not require artistry to achieve convergence, and it is effective for reasonably
large values of $\epsilon$. However, for small $\epsilon$ the method is frustratingly slow. Might
there be an algorithm in the Dantzig-Wolfe model that converges more quickly?

Our aim in this paper has been to address this question, and to provide
evidence that the answer is no. However, our lower bound technique is incapable
of proving a lower bound that is superlinear in $m$, the number of rows of $A$.
The reason is that for any $m$-row matrix $A$, there is an $m$-column submatrix $B$
such that $V(B) = V(A)$. This raises the question of whether there is a
Dantzig-Wolfe-type method that requires a number of iterations polynomial in $m$ but
subquadratic in $1/\epsilon$.